cosmos: handle HTTP 403/sub-status 5300 (AAD_REQUEST_NOT_AUTHORIZED) by refreshing bearer token and retrying #46167
…and async)

When Cosmos DB returns HTTP 403 with sub-status 5300 (AAD_REQUEST_NOT_AUTHORIZED), the cached bearer token is now cleared and the request is retried with a fresh token. This mirrors how the base class handles HTTP 401, and resolves the issue where long-running services using managed identity would permanently fail after token expiry.

- Added send() override to CosmosBearerTokenCredentialPolicy (_auth_policy.py)
- Added send() override to AsyncCosmosBearerTokenCredentialPolicy (_auth_policy_async.py)
- Added unit tests for both sync and async policies

Agent-Logs-Url: https://github.com/Azure/azure-sdk-for-python/sessions/a5381531-6292-4e5e-be43-586d3267d980
Co-authored-by: bambriz <8497145+bambriz@users.noreply.github.com>
Agent-Logs-Url: https://github.com/Azure/azure-sdk-for-python/sessions/a5381531-6292-4e5e-be43-586d3267d980 Co-authored-by: bambriz <8497145+bambriz@users.noreply.github.com>
Replace MagicMock-based policy isolation tests with full azure-core Pipeline/AsyncPipeline tests that use a real MockTransport (extending HttpTransport/AsyncHttpTransport). This approach:

- Returns proper requests.Response objects with real header dicts, including x-ms-substatus for the 403/5300 trigger
- Records all outgoing requests so tests can assert on the exact Authorization header value (type=aad&ver=1.0&sig=<token>)
- Verifies both the Cosmos AAD header format and token freshness end-to-end through the full policy pipeline, matching the pattern used in azure-core's own authentication tests

Agent-Logs-Url: https://github.com/Azure/azure-sdk-for-python/sessions/a67aa68c-a3b6-42ef-824f-aa45415226d6
Co-authored-by: bambriz <8497145+bambriz@users.noreply.github.com>
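The recording-transport pattern described in this commit can be illustrated with a small standard-library sketch. It is not the PR's test code: `RecordingTransport`, `cosmos_aad_header`, and `send_with_refresh` are hypothetical names, and responses are reduced to `(status, headers)` tuples instead of real `requests.Response` objects. Only the header format and the `x-ms-substatus` header name come from the commit message.

```python
from typing import Dict, List, Tuple

Response = Tuple[int, Dict[str, str]]  # (status code, response headers)


class RecordingTransport:
    """Hypothetical mock transport: replays scripted responses and records request headers."""

    def __init__(self, scripted: List[Response]) -> None:
        self._scripted = list(scripted)
        self.requests: List[Dict[str, str]] = []

    def send(self, headers: Dict[str, str]) -> Response:
        # Store a copy so later mutation of the caller's dict cannot change the record.
        self.requests.append(dict(headers))
        return self._scripted.pop(0)


def cosmos_aad_header(token: str) -> str:
    """Build the Authorization value in the format quoted in the commit message."""
    return f"type=aad&ver=1.0&sig={token}"


def send_with_refresh(transport: RecordingTransport, tokens: List[str]) -> int:
    """Send with tokens[0]; on 403 with sub-status 5300, send once more with tokens[1]."""
    status, headers = transport.send({"Authorization": cosmos_aad_header(tokens[0])})
    if status == 403 and headers.get("x-ms-substatus") == "5300":
        status, headers = transport.send({"Authorization": cosmos_aad_header(tokens[1])})
    return status
```

Because the transport keeps every outgoing header set, a test can assert both that a retry happened and that the retry carried the fresh token, which a mock of the policy alone would not show.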
Agent-Logs-Url: https://github.com/Azure/azure-sdk-for-python/sessions/6a43abe4-336d-48bb-9181-24523f9dc925 Co-authored-by: bambriz <8497145+bambriz@users.noreply.github.com>
@copilot Update the changelog to mention the PR and the fix applied.
Agent-Logs-Url: https://github.com/Azure/azure-sdk-for-python/sessions/50ccc6e6-b671-434d-97cc-9469276b13da Co-authored-by: bambriz <8497145+bambriz@users.noreply.github.com>
Done in commit
Pull request overview
This PR updates the Cosmos AAD bearer token auth policies to recover from a Cosmos-specific auth failure pattern (HTTP 403 with sub-status 5300 / AAD_REQUEST_NOT_AUTHORIZED) by clearing the cached token and retrying once, and adds unit coverage for both sync and async pipelines.
Changes:
- Add 403/5300 detection in both sync and async Cosmos bearer token credential policies to refresh token and retry once.
- Add new sync/async unit tests using Pipeline/AsyncPipeline with mock transports to validate retry behavior and Authorization header formatting.
- Add a changelog entry documenting the bug fix.
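As a rough illustration of the retry-once behavior summarized above, here is a minimal standard-library sketch. It is an assumption-laden stand-in, not the code in `_auth_policy.py`: `FakeResponse`, `is_aad_not_authorized`, and `RetryOnceAuth` are hypothetical names, and the transport is modeled as a plain callable that receives the Authorization value.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

SUBSTATUS_HEADER = "x-ms-substatus"   # header name as given in the PR discussion
AAD_REQUEST_NOT_AUTHORIZED = "5300"   # sub-status as given in the PR title


@dataclass
class FakeResponse:
    """Hypothetical stand-in for an HTTP response."""
    status_code: int
    headers: Dict[str, str] = field(default_factory=dict)


def is_aad_not_authorized(response: FakeResponse) -> bool:
    """True only for HTTP 403 that carries sub-status 5300."""
    if response.status_code != 403:
        return False
    return response.headers.get(SUBSTATUS_HEADER, "").strip() == AAD_REQUEST_NOT_AUTHORIZED


class RetryOnceAuth:
    """Hypothetical policy: caches a token and refreshes it at most once per send."""

    def __init__(self, get_token: Callable[[], str]) -> None:
        self._get_token = get_token
        self._token: Optional[str] = None

    def _authorization(self) -> str:
        if self._token is None:
            self._token = self._get_token()
        return f"type=aad&ver=1.0&sig={self._token}"

    def send(self, transport: Callable[[str], FakeResponse]) -> FakeResponse:
        response = transport(self._authorization())
        if is_aad_not_authorized(response):
            self._token = None                           # drop the cached token
            response = transport(self._authorization())  # single retry, no loop
        return response
```

The second response is returned whatever its status, so a service that keeps answering 403/5300 sees exactly two requests and not an unbounded retry loop.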
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| sdk/cosmos/azure-cosmos/azure/cosmos/_auth_policy.py | Adds sync send() override to retry once on 403 + substatus 5300 after clearing cached token. |
| sdk/cosmos/azure-cosmos/azure/cosmos/aio/_auth_policy_async.py | Adds async send() override with the same 403/5300 retry behavior. |
| sdk/cosmos/azure-cosmos/tests/test_auth_policy_unit.py | New sync unit tests verifying retry and Cosmos AAD Authorization header format. |
| sdk/cosmos/azure-cosmos/tests/test_auth_policy_unit_async.py | New async unit tests verifying retry and Cosmos AAD Authorization header format. |
| sdk/cosmos/azure-cosmos/CHANGELOG.md | Documents the 403/5300 retry fix under Bugs Fixed. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
:return: The pipeline response object
:rtype: ~azure.core.pipeline.PipelineResponse
"""
retry_key = "_cosmos_aad_403_retried"
@copilot What is the point of this retry_key? It causes test_aad_credentials to fail with TypeError: Session.request() got an unexpected keyword argument '_cosmos_aad_403_retried'.
Fixed in commit 36cd3987. The retry_key was being stored in request.context.options, which azure-core forwards as keyword arguments to the underlying Session.request() call — causing the TypeError. The flag is now stored directly in request.context (the PipelineContext dict), which is designed for pipeline-level metadata and is never forwarded to the transport.
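The failure mode described in this reply can be reproduced with a small standard-library sketch. Everything here is a hypothetical stand-in (`session_request`, `Context`, and `send` are not azure-core or requests APIs); the sketch only assumes what the reply states: the context is a dict for pipeline metadata, and its `options` are expanded into keyword arguments for the transport call.

```python
from typing import Any, Dict


def session_request(method: str, url: str, timeout: float = 10.0) -> str:
    """Hypothetical transport call with a fixed keyword signature."""
    return f"{method} {url} timeout={timeout}"


class Context(dict):
    """Hypothetical pipeline context: dict entries hold metadata, `options` holds transport kwargs."""

    def __init__(self, **options: Any) -> None:
        super().__init__()
        self.options: Dict[str, Any] = options


def send(context: Context) -> str:
    # Only `options` is expanded into keyword arguments; dict entries stay in the pipeline.
    return session_request("GET", "https://example.invalid/", **context.options)


# A flag stored as a dict entry never reaches the transport call.
ok = Context(timeout=5.0)
ok["_cosmos_aad_403_retried"] = True

# The same flag stored in `options` becomes an unexpected keyword argument.
bad = Context(timeout=5.0)
bad.options["_cosmos_aad_403_retried"] = True
```

Calling `send(ok)` succeeds, while `send(bad)` raises `TypeError` for the unknown keyword, which matches the shape of the error reported in the review comment.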
…ntext.options Agent-Logs-Url: https://github.com/Azure/azure-sdk-for-python/sessions/03081b79-9c5f-4572-b475-45d1ba581109 Co-authored-by: bambriz <8497145+bambriz@users.noreply.github.com>
- AsyncCosmosBearerTokenCredentialPolicy does not handle HTTP 403 with sub-status 5300 (AAD_REQUEST_NOT_AUTHORIZED); only 401 is handled
- Add send() override to AsyncCosmosBearerTokenCredentialPolicy in _auth_policy_async.py to clear the cached token and retry on 403/5300
- Add send() override to sync CosmosBearerTokenCredentialPolicy in _auth_policy.py for the same fix
- Test through Pipeline/AsyncPipeline with a MockTransport that returns proper requests.Response objects with headers
- Verify the Authorization header format (type=aad&ver=1.0&sig=<token>) in both initial and retry requests
- Fix TypeError in test_aad_credentials: store the retry flag in request.context (dict) instead of request.context.options (forwarded to the transport as kwargs)